Henan Province
- Oceania > New Zealand (0.04)
- Europe > Germany (0.04)
- Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
- Semiconductors & Electronics (0.64)
- Information Technology (0.47)
- Transportation (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > Dominican Republic (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Government (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Supplementary Material for Accurate Interpolation for Scattered Data through Hierarchical Residual Refinement
Ding, Shizhe
In the embedding phase, NIERT uniformly embeds both observed and target points. A learnable mask vector is introduced for target points lacking value data. The core of the NIERT interpolator is a Transformer encoder with a masked self-attention mechanism, uniformly encoding observed and ... NIERT, a Transformer encoder-only architecture that uniformly encodes observed points and models their correlations, exhibits superior interpolation accuracy. Our proposed architecture, specifically adapted to HINT's overall framework, introduces ... HINT employs residuals on observed points to estimate residuals on target points. Table 1: Statistics of the interpolation tasks used for training in each dataset. ... Theoretical dataset II: Perlin is another synthetic collection of interpolation tasks, designed for the numerical interpolation of two-dimensional rough functions.
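The statement that residuals on observed points are used to estimate residuals on target points can be illustrated with a small sketch. This is not the HINT architecture: a Gaussian kernel smoother stands in for the neural interpolation blocks, and the function names `kernel_smooth` and `residual_refine`, the bandwidth schedule, and the toy function are all assumptions made for illustration.

```python
import numpy as np

def kernel_smooth(x_obs, v_obs, x_query, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (stand-in interpolator)."""
    # Pairwise squared distances between query points and observed points.
    d2 = ((x_query[:, None, :] - x_obs[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w @ v_obs) / w.sum(axis=1)

def residual_refine(x_obs, y_obs, x_tgt, bandwidths=(0.3, 0.15, 0.07)):
    """Sum per-level estimates; each level fits the residuals left by earlier levels."""
    resid = y_obs.copy()
    pred_tgt = np.zeros(len(x_tgt))
    for h in bandwidths:
        # Estimate the current residual field at the target points.
        pred_tgt += kernel_smooth(x_obs, resid, x_tgt, h)
        # Update the residuals on the observed points for the next level.
        resid = resid - kernel_smooth(x_obs, resid, x_obs, h)
    return pred_tgt, resid

# Toy usage: scattered samples of a smooth 2-D function (hypothetical data).
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=(200, 2))
x_tgt = rng.uniform(0.0, 1.0, size=(50, 2))
f = lambda x: np.sin(4.0 * x[:, 0]) + x[:, 1] ** 2
pred, resid = residual_refine(x_obs, f(x_obs), x_tgt)
```

Each level only sees what the previous levels failed to explain on the observed points, which is the general idea behind residual refinement; the learned blocks in the paper would replace the fixed smoother used here.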
Low Rank Support Quaternion Matrix Machine
Chen, Wang, Luo, Ziyan, Wang, Shuangyue
For color image classification, input features are conventionally represented as vectors, matrices, or third-order tensors in the real field. Inspired by the success of quaternion data modeling for color images in image recovery and denoising tasks, we propose a novel method for color image classification, named the Low-rank Support Quaternion Matrix Machine (LSQMM), in which the RGB channels are treated as pure quaternions to effectively preserve the intrinsic coupling relationships among channels via the quaternion algebra. To promote low-rank structures resulting from strongly correlated color channels, a quaternion nuclear norm regularization term, a natural extension of the conventional matrix nuclear norm to the quaternion domain, is added to the hinge loss in our LSQMM model. An Alternating Direction Method of Multipliers (ADMM)-based iterative algorithm is designed to solve the proposed quaternion optimization model effectively. Experimental results on multiple color image classification datasets demonstrate that our proposed classification approach exhibits advantages in classification accuracy, robustness and computational efficiency, compared to several state-of-the-art methods using support vector machines, support matrix machines, and support tensor machines.
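As a rough illustration of treating RGB channels as pure quaternions and measuring low-rankness with a quaternion nuclear norm, the sketch below uses the standard complex adjoint representation of a quaternion matrix. It does not reproduce the LSQMM model or its ADMM solver; the function names `complex_adjoint` and `quaternion_nuclear_norm` are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np

def complex_adjoint(rgb):
    """Complex adjoint of the pure quaternion matrix R*i + G*j + B*k.

    rgb has shape (H, W, 3). Writing Q = A + C*j with complex matrices
    A and C, a pure quaternion matrix has A = i*R and C = G + i*B.
    """
    r, g, b = (rgb[..., c].astype(float) for c in range(3))
    a = 1j * r
    c = g + 1j * b
    # Standard 2x2 block form [[A, C], [-conj(C), conj(A)]].
    return np.block([[a, c], [-c.conj(), a.conj()]])

def quaternion_nuclear_norm(rgb):
    """Sum of quaternion singular values, computed via the complex adjoint.

    Each quaternion singular value appears twice among the singular
    values of the adjoint, hence the factor 1/2.
    """
    s = np.linalg.svd(complex_adjoint(rgb), compute_uv=False)
    return 0.5 * s.sum()
```

For a single pixel the norm reduces to the Euclidean length of its (R, G, B) vector, and for an image with only a red channel it reduces to the ordinary matrix nuclear norm of that channel, which gives two quick sanity checks.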
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.49)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning
Hu, Qianyue, Wu, Junyan, Lu, Wei, Luo, Xiangyang
Diffusion Models (DMs) have achieved remarkable success in realistic voice cloning (VC), while they also increase the risk of malicious misuse. Existing proactive defenses designed for traditional VC models aim to disrupt the forgery process, but they have been proven incompatible with DMs due to the intricate generative mechanisms of diffusion. To bridge this gap, we introduce VoiceCloak, a multi-dimensional proactive defense framework with the goal of obfuscating speaker identity and degrading perceptual quality in potential unauthorized VC. To achieve these goals, we conduct a focused analysis to identify specific vulnerabilities within DMs, allowing VoiceCloak to disrupt the cloning process by introducing adversarial perturbations into the reference audio. Specifically, to obfuscate speaker identity, VoiceCloak first targets speaker identity by distorting representation learning embeddings to maximize identity variation, which is guided by auditory perception principles. Additionally, VoiceCloak disrupts crucial conditional guidance processes, particularly attention context, thereby preventing the alignment of vocal characteristics that are essential for achieving convincing cloning. Then, to address the second objective, VoiceCloak introduces score magnitude amplification to actively steer the reverse trajectory away from the generation of high-quality speech. Noise-guided semantic corruption is further employed to disrupt structural speech semantics captured by DMs, degrading output quality. Extensive experiments highlight VoiceCloak's outstanding defense success rate against unauthorized diffusion-based voice cloning. Audio samples of VoiceCloak are available at https://voice-cloak.github.io/VoiceCloak/.
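The general idea of adding a small, bounded adversarial perturbation to a reference signal so that its embedding moves away from the clean one can be sketched with projected gradient ascent on a toy problem. This is not VoiceCloak's objective or model: the linear map `W` merely stands in for a speaker encoder, and the function name `pgd_push_embedding`, the L-infinity budget `eps`, and the step size are assumptions made for illustration.

```python
import numpy as np

def pgd_push_embedding(x, W, eps=0.01, step=0.002, iters=20, seed=0):
    """Maximize ||W(x + delta) - W x||^2 subject to max|delta| <= eps."""
    rng = np.random.default_rng(seed)
    # Random start inside the budget; the gradient vanishes at delta = 0.
    delta = rng.uniform(-eps, eps, size=x.shape)
    e_clean = W @ x
    for _ in range(iters):
        diff = W @ (x + delta) - e_clean
        grad = 2.0 * W.T @ diff  # gradient of ||diff||^2 with respect to delta
        # Signed ascent step, then projection back onto the L-infinity ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return x + delta

# Toy usage with a random "encoder" and a random "waveform" (hypothetical data).
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
x = rng.normal(size=16)
x_adv = pgd_push_embedding(x, W)
```

The clipping step is what keeps the perturbation small; a real defense would replace the toy objective with losses defined on the cloning model itself.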
RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering
Zhang, Rongyang, Huang, Yuqing, Lu, Chengqiang, Wang, Qimeng, Gao, Yan, Wu, Yi, Hu, Yao, Xu, Yin, Wang, Wei, Wang, Hao, Chen, Enhong
In real-world scenarios, providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text generation. Despite recent progress, like the visual autoregressive model that unifies text and image processing in a single transformer architecture, generating high-quality interleaved content remains challenging. Moreover, evaluations of these interleaved sequences largely remain underexplored, with existing benchmarks often limited by unimodal metrics that inadequately assess the intricacies of combined image-text outputs. To address these issues, we present RAG-IGBench, a thorough benchmark designed specifically to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering. RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content. Distinct from previous datasets, RAG-IGBench draws on the latest publicly available content from social platforms and introduces innovative evaluation metrics that measure the quality of text and images, as well as their consistency. Through extensive experiments with state-of-the-art MLLMs (both open-source and proprietary) on RAG-IGBench, we provide an in-depth analysis examining the capabilities and limitations of these models. Additionally, we validate our evaluation metrics by demonstrating their high correlation with human assessments. Models fine-tuned on RAG-IGBench's training set exhibit improved performance across multiple benchmarks, confirming both the quality and practical utility of our dataset. Our benchmark is available at https://github.com/USTC-StarTeam/RAG-IGBench.
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- Information Technology (0.46)
- Media (0.34)
Memory-DD: A Low-Complexity Dendrite-Inspired Neuron for Temporal Prediction Tasks
Yang, Dongjian, Li, Xiaoyuan, Xi, Chuanmei, Sun, Ye, Liu, Gang
Dendrite-inspired neurons have been widely used in tasks such as image classification due to low computational complexity and fast inference speed. Temporal data prediction, as a key machine learning task, plays a key role in real-time scenarios such as sensor data analysis, financial forecasting, and urban traffic management. However, existing dendrite-inspired neurons are mainly designed for static data. Studies on capturing dynamic features and modeling long-term dependencies in temporal sequences remain limited. Efficient architectures specifically designed for temporal sequence prediction are still lacking. In this paper, we propose Memory-DD, a low-complexity dendrite-inspired neuron model. Memory-DD consists of two dendrite-inspired neuron groups that contain no nonlinear activation functions but can still realize nonlinear mappings. Compared with traditional neurons without dendritic functions, Memory-DD requires only two neuron groups to extract logical relationships between features in input sequences. This design effectively captures temporal dependencies and is suitable for both classification and regression tasks on sequence data. Experimental results show that Memory-DD achieves an average accuracy of 89.41% on 18 temporal classification benchmark datasets, outperforming LSTM by 4.25%. On 9 temporal regression datasets, it reaches comparable performance to LSTM, while using only 50% of the parameters and reducing computational complexity (FLOPs) by 27.7%. These results demonstrate that Memory-DD successfully extends the low-complexity advantages of dendrite-inspired neurons to temporal prediction, providing a low-complexity and efficient solution for time-series data processing. With the rapid development of information technology, massive temporal sequence data have become the foundation of modern society, ranging from industrial IoT to financial markets.
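The claim that a module without nonlinear activation functions can still realize a nonlinear mapping can be illustrated with a multiplicative unit. The sketch assumes an elementwise (Hadamard) product between a linear map of the previous output and the input, a form used in earlier dendrite-inspired modules; the abstract does not specify Memory-DD's internals, so the function `dd_module` and its structure are assumptions, not the proposed model.

```python
import numpy as np

def dd_module(W, a_prev, x):
    """One multiplicative module: (W @ a_prev) * x, with no activation function."""
    return (W @ a_prev) * x

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
x = rng.normal(size=4)

# Feeding the input in twice makes the output quadratic in x.
y1 = dd_module(W, x, x)
# Doubling the input quadruples the output, so the map is not linear.
y2 = dd_module(W, 2.0 * x, 2.0 * x)
```

Stacking such modules raises the polynomial degree of the mapping while using only matrix products and elementwise multiplications.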
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Shaanxi Province (0.04)
- Asia > China > Henan Province (0.04)
- Banking & Finance (0.68)
- Health & Medicine > Therapeutic Area (0.46)
fMRI2GES: Co-speech Gesture Reconstruction from fMRI Signal with Dual Brain Decoding Alignment
Zhu, Chunzheng, Shao, Jialin, Lin, Jianxin, Wang, Yijun, Wang, Jing, Tang, Jinhui, Li, Kenli
Understanding how the brain responds to external stimuli and decoding this process has been a significant challenge in neuroscience. While previous studies typically concentrated on brain-to-image and brain-to-language reconstruction, our work strives to reconstruct gestures associated with speech stimuli perceived by the brain. Unfortunately, the lack of paired {brain, speech, gesture} data hinders the deployment of deep learning models for this purpose. In this paper, we introduce a novel approach, fMRI2GES, that allows training of fMRI-to-gesture reconstruction networks on unpaired data using Dual Brain Decoding Alignment. This method relies on two key components: (i) observed texts that elicit brain responses, and (ii) textual descriptions associated with the gestures. Then, instead of training models in a completely supervised manner to find a mapping relationship among the three modalities, we harness an fMRI-to-text model, a text-to-gesture model with paired data and an fMRI-to-gesture model with unpaired data, establishing dual fMRI-to-gesture reconstruction patterns. Afterward, we explicitly align the two outputs and train our model in a self-supervised manner. We show that our proposed method can reconstruct expressive gestures directly from fMRI recordings. We also investigate fMRI signals from different ROIs in the cortex and how they affect generation results. Overall, we provide new insights into decoding co-speech gestures, thereby advancing our understanding of neuroscience and cognitive science.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Hunan Province > Changsha (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)